In [1]:
import pandas as pd
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
data.head(1)
Out[1]:
In [2]:
data.columns
Out[2]:
In [3]:
data['Do you celebrate Thanksgiving?'].value_counts()
Out[3]:
In [4]:
boolean = (data['Do you celebrate Thanksgiving?'] == 'Yes')
data = data.loc[boolean]
data['Do you celebrate Thanksgiving?'].value_counts()
Out[4]:
In [5]:
data['What is typically the main dish at your Thanksgiving dinner?'].value_counts()
Out[5]:
In [6]:
filt_tofur = data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']
filt_tofur['Do you typically have gravy?'].value_counts()
Out[6]:
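The cell above checks gravy only for Tofurkey eaters. `pd.crosstab` can tabulate gravy answers for every main dish at once. Here is a minimal sketch on a small hypothetical frame. The short column names `main_dish` and `gravy` are placeholders for the survey's full question headers, and the rows are invented:

```python
import pandas as pd

# Hypothetical miniature of the survey; short column names stand in for the real headers.
toy = pd.DataFrame({
    "main_dish": ["Turkey", "Turkey", "Tofurkey", "Tofurkey", "Ham/Pork"],
    "gravy": ["Yes", "Yes", "No", "Yes", "No"],
})

# Rows are main dishes, columns are gravy answers, cells are respondent counts.
counts = pd.crosstab(toy["main_dish"], toy["gravy"])
print(counts)
```

Combinations that never occur, such as Turkey without gravy here, show up as 0 rather than being dropped.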
In [7]:
apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])
# True for respondents who left all three pie columns empty, i.e. who reported none of these pies
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
no_pies.value_counts()
Out[7]:
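The mask in [7] is True only for respondents who left all three pie columns empty. The same mask can be built without one variable per pie by selecting the columns together and calling `isnull().all(axis=1)`. The sketch below uses a hypothetical three-row frame that reuses the survey's column-name prefix:

```python
import numpy as np
import pandas as pd

prefix = ("Which type of pie is typically served at your Thanksgiving dinner? "
          "Please select all that apply. - ")

# Hypothetical respondents: a filled cell means that pie was selected, NaN means it was not.
toy = pd.DataFrame({
    prefix + "Apple": ["Apple", np.nan, np.nan],
    prefix + "Pumpkin": ["Pumpkin", np.nan, "Pumpkin"],
    prefix + "Pecan": [np.nan, np.nan, np.nan],
})

pie_cols = [prefix + pie for pie in ("Apple", "Pumpkin", "Pecan")]

# True only where every one of the selected pie columns is empty.
no_pies = toy[pie_cols].isnull().all(axis=1)

# Equivalent to chaining the per-column masks with &.
chained = (toy[pie_cols[0]].isnull()
           & toy[pie_cols[1]].isnull()
           & toy[pie_cols[2]].isnull())
print(no_pies.value_counts())
```

Adding another pie to `pie_cols` extends the check without writing a new mask.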
In [8]:
print(data['Age'].value_counts())
In [9]:
def convert(string):
    # Age answers are brackets such as "18 - 29" or "60+"; keep the lower bound.
    if pd.isnull(string):
        return None
    lower = string.split(' ')[0]
    return int(lower.replace('+', ''))
data['int_age'] = data["Age"].apply(convert)
data['int_age'].describe()
Out[9]:
Although each age was reduced to the lower bound of its bracket, the converted values preserve the relative sizes of the age groups shown by `value_counts()` in [8].
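Each bracket maps to exactly one lower bound, so the group sizes cannot change. The sketch below demonstrates this on hypothetical answers. It uses the vectorized `Series.str.extract` instead of the notebook's `apply(convert)`, and it assumes the bracket labels look like "18 - 29" and "60+":

```python
import numpy as np
import pandas as pd

# Hypothetical answers in the assumed bracket format; NaN is a skipped question.
ages = pd.Series(["18 - 29", "30 - 44", "30 - 44", "60+", np.nan])

# Keep the leading number of each bracket; missing answers stay NaN.
int_age = ages.str.extract(r"^(\d+)", expand=False).astype(float)

# One bracket -> one lower bound, so the counts per group are identical.
print(ages.value_counts())
print(int_age.value_counts())
```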
In [10]:
def income(string):
    # Income answers are brackets such as "$25,000 to $49,999"; keep the lower bound.
    if pd.isnull(string):
        return None
    lower = string.split(' ')[0]
    # "Prefer not to answer" carries no numeric value.
    if lower == 'Prefer':
        return None
    lower = lower.replace('$', '').replace(',', '')
    return int(lower)
data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(income)
data['int_income'].describe()
Out[10]:
In [11]:
print(data['How much total combined money did all members of your HOUSEHOLD earn last year?'].value_counts())
When interpreting this result, keep in mind that each respondent's income was replaced by the lower bound of their bracket, so these statistics understate actual household income. The standard deviation is also large: 59 thousand dollars, more than twice the 25-thousand-dollar width of each bracket.
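To see how much the lower-bound choice matters, both ends of each bracket can be parsed and a midpoint compared with the lower bound. This is an illustration only, not the method used in [10]. The bracket labels below ("$25,000 to $49,999", "$200,000 and up", "Prefer not to answer") are assumed to match the survey's wording, and the values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical answers in the assumed bracket format of the income question.
income = pd.Series(["$25,000 to $49,999", "$50,000 to $74,999",
                    "$200,000 and up", "Prefer not to answer", np.nan])

# Strip "$" and "," then capture the first number and, if present, the one after "to".
cleaned = income.str.replace(r"[$,]", "", regex=True)
bounds = cleaned.str.extract(r"^(\d+)(?: to (\d+))?")
lower = bounds[0].astype(float)
upper = bounds[1].astype(float)

# Midpoint of closed brackets; the open-ended "and up" bracket has no upper bound,
# so its midpoint stays NaN, as do non-answers.
midpoint = (lower + upper) / 2
print(pd.DataFrame({"lower": lower, "midpoint": midpoint}))
```

For closed brackets the midpoint sits roughly half a bracket width (about $12,500) above the lower bound. This gap is the downward bias the paragraph above warns about.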
In [12]:
data[data['int_income'] < 50000]['How far will you travel for Thanksgiving?'].value_counts(normalize=True)
Out[12]:
In [13]:
data[data['int_income'] >= 150000]['How far will you travel for Thanksgiving?'].value_counts(normalize=True)
Out[13]:
Only 38 percent of respondents with an income below $50,000 celebrate at home, compared with 47 percent of those earning $150,000 or more. The difference is modest (9 percentage points). One possible explanation is that the lower-income group includes students, who tend to celebrate at the homes of their higher-earning parents, while those parents host the holiday in their own homes.
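Cells [12] and [13] filter each income group separately. Alternatively, the incomes can be binned with `pd.cut` and all groups compared in one table with `pd.crosstab(..., normalize="index")`. The sketch below uses hypothetical respondents. The label `"home"` stands in for the survey's "won't travel" answer, and the bin edges mirror the two thresholds used above:

```python
import numpy as np
import pandas as pd

# Hypothetical respondents with lower-bound incomes and simplified travel answers.
toy = pd.DataFrame({
    "int_income": [10000, 25000, 40000, 75000, 150000, 175000, 200000],
    "travel": ["home", "local", "far", "far", "home", "home", "local"],
})

# right=False gives the bins [-inf, 50000), [50000, 150000), [150000, inf),
# matching the "< 50000" and ">= 150000" filters in cells [12] and [13].
groups = pd.cut(toy["int_income"], bins=[-np.inf, 50000, 150000, np.inf],
                right=False, labels=["< 50k", "50k-150k", ">= 150k"])

# normalize="index" turns each row into shares that sum to 1.
shares = pd.crosstab(groups, toy["travel"], normalize="index")
print(shares)
```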
In [15]:
table = pd.pivot_table(data, values='int_age', index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns='Have you ever attended a "Friendsgiving?"')
table
Out[15]:
In [16]:
table_income = pd.pivot_table(data, values='int_income', index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns='Have you ever attended a "Friendsgiving?"')
table_income
Out[16]:
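It appears that younger people are more likely both to meet up with hometown friends on Thanksgiving night and to attend a "Friendsgiving". The pivot tables support this: respondents who answered "Yes" to both questions are younger on average (mean age 33.9) and have a lower mean income ($66,019). Both figures are means of bracket lower bounds, not exact ages or incomes.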
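Cells [15] and [16] rely on `pd.pivot_table` averaging by default (`aggfunc="mean"`). The sketch below makes that aggregation explicit by reproducing a pivot table with `groupby(...).mean().unstack()`. It uses a hypothetical frame in which `hometown_friends` and `friendsgiving` are short placeholders for the two survey questions:

```python
import pandas as pd

# Hypothetical respondents; short names stand in for the two yes/no survey questions.
toy = pd.DataFrame({
    "hometown_friends": ["Yes", "Yes", "No", "No", "Yes"],
    "friendsgiving": ["Yes", "No", "No", "Yes", "Yes"],
    "int_age": [18, 45, 60, 30, 30],
})

# pivot_table averages the values in each (index, column) cell by default.
table = pd.pivot_table(toy, values="int_age",
                       index="hometown_friends", columns="friendsgiving")

# The same numbers via an explicit groupby, which makes the mean visible.
same = toy.groupby(["hometown_friends", "friendsgiving"])["int_age"].mean().unstack()
print(table)
```

A mean says nothing about how many respondents sit in each cell. Passing `aggfunc="count"` to `pd.pivot_table` gives the group sizes, which help judge how much weight a difference like 33.9 years deserves.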
In [18]:
none1 = data['Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply. - Other (please specify)'].value_counts()
none2 = data['Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply. - Other (please specify).1'].value_counts()
In [19]:
none1
Out[19]:
In [20]:
none2
Out[20]:
In [21]:
data['Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply. - Peach cobbler'].value_counts()
Out[21]: